Abstract

This work examines possible sources of training difficulty encountered
by  learners  of  speech  spectrogram  reading.  Such  difficulty  has  been 
attributed  to  the  context-dependent  nature  of  the  visual  segmentation 
of spectrogram patterns (Liberman et al., 1968), and suggestions by
researchers of other difficult skills (Biederman & Shiffrar, 1987) have
also  implicated  visual  segmentation.  In  both  cases,  the  discriminations 
necessary  to  distinguish  important  parts  can  be  easily  made  once 
identified,  but  are  enormously  difficult  to  discover.  The  experiments 
presented  here  used  a  pseudo-spectrogram  reading  task  which  varied 
the  segmentation  rules  subjects  were  required  to  discover.  Experiment 
1  found  that  considerable  learning  difficulty  could  be  produced  by  this 
task,  but  confounded  the  source  of  that  difficulty  among  several 
factors.  The  second  experiment  attempted  to  identify  the  sources  of  the 
difficulty.  Segmentation  was  found  to  contribute  significantly.  The 
salience  of  the  important  cues,  and,  potentially,  the  demands  of  the 
learning  task  were  also  found  to  increase  the  difficulty  of  discovering 
important  visual  distinctions.  These  results  are  discussed  with  respect 
to  the  skill  of  spectrogram  reading  and  theories  of  perceptual  attention 
learning.

Difficulty in Learning to Read Speech Spectrograms:

The  Role  of  Visual  Segmentation 

When  acquiring  a  new  perceptual  skill,  a  learner  is  usually 
faced  with  the  problem  of  learning  to  recognize  new  features  and 
discovering  which  combinations  of  features  form  meaningful  patterns. 
In  X-ray  reading,  for  example,  a  student  must  learn  which  features 
indicate  normal  tissue  and  which  indicate  diseased  tissue.  Such 
learning  is  cognitive:  the  visual  system  breaks  up  the  visual  array  into 
parts  and  recognition  responses  occur  to  learned  features,  but 
cognitive  processing  and  training  are  required  to  make  decisions  about 
which  parts  are  important  and  how  they  combine  to  form  higher-level 
patterns. 

Theories  of  perceptual  learning  have  characterized  this  cognitive 
processing as an hypothesis-and-test procedure (Levine, 1975;
Trabasso & Bower, 1968) which results in the building of pattern
detectors (Kahneman, 1973; Chase & Simon, 1973). More recently,
attention has focused on the types of preferences or heuristics which
may be required to constrain hypothesis search in complex displays
(Michalski, 1983; Medin, Wattenmaker & Michalski, 1987). One type of
constraint  the  cognitive  system  must  make  is  where  to  draw  object 
boundaries,  i.e.,  which  parts  belong  together  as  objects.  Characteristics 
such  as  spatial  relations,  overlap,  proximity,  and  shading  differences 
may play a role in determining object coherence (Treisman, 1986). For
certain  perceptual  skills,  however,  such  segmentation  decisions  can 
create difficulties. For example, in X-ray pictures, brightness
corresponds to the density of tissue rather than to any reflective property
(Squire, 1988). Hence, visual contours and separations may not
correspond to organ or tissue boundaries. For instance, if two organs
of equal density abut, no contour will appear between them. A
radiology student needs to learn a new way of segmenting an X-ray
picture  to  identify  the  locations  of  organs  and  other  tissue  groups. 

The  present  research  is  concerned  with  learning  difficulties 
which  may  result  when  visual  segmentation  does  not  correspond  to 
object  segmentation.  Its  focus  is  on  a  skill  which  until  recently  was 
considered extremely difficult if not impossible to learn: speech
spectrogram reading. Much of the difficulty in spectrogram reading has
been  attributed  to  problems  in  segmenting  the  display.  The  goal  of  this 
research was, first, to show that learning difficulty could be produced
by violating segmentation assumptions, and, second, to look at how
segmentation  interacts  with  other  stimulus  and  task  variables. 

A  speech  spectrogram  is  a  graph  of  the  energy  in  different 
frequency  components  of  speech  over  a  short  sampled  time.  Its  two 
axes  represent  frequency  and  time,  and  the  darkness  of  a  small  region 
represents  the  amount  of  sound  energy  at  the  frequency  and  time 
matching its coordinates. When real-time spectrographic displays were
first  developed,  it  was  hoped  that  people,  especially  the  hearing 
impaired,  could  be  taught  to  recognize  speech  by  seeing  it.  However, 
learning  to  identify  speech  from  this  graphical  display  has  proven  to  be 
difficult,  requiring  both  an  understanding  of  acoustic-phonetics  and 
many hours of practice. Potter, Kopp, and Green (1947), in one of the
earliest  efforts  toward  such  training,  taught  a  group  of  subjects  to 
identify  important  acoustic  features  in  spectrograms  and  then  had 
them  try  to  communicate  with  each  other  using  a  real-time 
spectrographic display. They found that the number of common words,
spoken by a single person, that subjects could identify increased linearly
with practice, at a rate of about 4 words per hour. That is, prior learning
did not aid the learning of new words. A similar learning rate was
found by Greene, Pisoni, and Carrell (1984), who had naive subjects
learn  to  identify  spectrograms  of  50  words  made  by  a  single  speaker. 
The  subjects  began  with  four  words  and  were  gradually  given 
additional  sets  of  four  words  over  22  sessions.  After  about  13  sessions 
the  subjects  were  able  to  learn  the  new  items  with  few  errors  and 
show  a  fair  amount  of  transfer  to  a  new  list  of  words  by  the  same 
speaker  (91.3%)  and  the  original  word  list  spoken  by  a  different 
speaker  (76%).  These  studies  have  been  viewed  optimistically  as 
demonstrating  that  people  can  be  trained  to  recognize  visual  speech. 
However,  the  studies  are  limited  by  their  use  of  speech  from  a  single 
speaker,  or  by  their  focus  on  learning  of  individual  words  which  would 
not  generalize  well  to  continuous  speech. 
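
As an illustration of the display described at the start of this section, a spectrogram can be computed by slicing a waveform into short overlapping frames and measuring the energy in each frequency band of each frame. The Python sketch below is not taken from any of the studies cited; the frame length, hop size, Hann window, and test tone are assumptions chosen only to keep the example small.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of per-frequency energies for each time frame."""
    # A Hann window (an assumed choice) reduces leakage between frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        energies = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            energies.append(abs(coeff) ** 2)
        frames.append(energies)
    return frames


# Hypothetical test signal: a pure tone with 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
# Index of the strongest frequency bin in each time frame.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

Each inner list corresponds to one time slice. Plotting energy as darkness, with time on the horizontal axis and frequency on the vertical axis, gives the kind of display discussed here.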

More  impressive  has  been  the  effort  of  Dr.  Victor  Zue,  who  has 
taught himself to read spectrograms of continuous speech,
independent of speaker, with a high level of accuracy (Cole, Rudnicky,
Zue,  &  Reddy,  1980).  Zue  systematically  studied  spectrogram  patterns 
for one hour per day over several years. Through this extensive practice,
along with his expertise in acoustic-phonetics, he has discovered features
and rules which enable him to identify phoneme segments with an
accuracy of about 85%. The features Zue uses are spectral patterns
unique  to  individual  phonemes,  but  he  augments  simple  detection  of 
these  features  with  knowledge  of  coarticulation  effects,  which  can 
distort  the  features,  and  a  knowledge  of  phonotactic  constraints  in  the 
English  language.  Zue  has  also  been  successful  in  identifying  the  rules 
he  uses  to  recognize  phonemes  and  in  teaching  others  to  use  the  rules 
to  read  spectrograms  with  much  less  practice  (40  hrs  vs  2000  hrs) 
(Cole & Zue, 1980).

But  what  is  the  original  source  of  the  difficulty  which  limited 
subjects  in  early  studies  to  small  vocabularies,  and  required  2000+ 
hours  of  training  plus  acoustic-phonetic  knowledge  on  the  part  of 
Victor  Zue?  In  an  article  entitled  "Why  are  speech  spectrograms  hard 
to read?", Liberman et al. (1968) identify the major reason for this
learning  difficulty  as  the  context-dependent  nature  of  the  acoustic 
signal.  How  a  sound  is  articulated,  and  hence  how  it  appears  on  a 
spectrogram,  depends  on  what  other  sounds  are  made  immediately 
before  and  after  it.  A  vowel  following  a  /d/  will  look  different  from  one 
following  a  /g/.  Context  dependency  leads  to  a  special  learning 
difficulty  because  of  the  inherent  difference  between  the  way  the  visual 
and  auditory  systems  segment  the  acoustic  pattern.  To  the  visual 
system,  a  vowel  followed  by  a  stop  consonant  appears  as  a  wide  dark 
band  beside  a  narrow  dark  band  with  a  blank  space  in  between,  i.e., 
two  distinct  objects.  However,  the  auditory  segmentation  of  those  two 
sounds  is  more  overlapping  and  blurred:  part  of  the  stop  sound  is  due 
to the vowel transition. Liberman et al. (1968) saw this difference
between  the  auditory  and  visual  systems  as  so  fundamental  that  they 
asserted  "no  amount  of  training  will  cause  an  appropriate  speech 
decoder  to  develop  for  a  visual  input"  (p.  131).  Victor  Zue  has  proven 
their  appraisal  wrong,  but  he  has  also  shown  that  their  analysis  of  the 
source  of  difficulty  may  be  correct:  much  of  his  ability  is  based  on  his 
knowledge of coarticulation (context-dependent) effects.

Why should context dependency and its associated segmentation
problem produce learning difficulties? According to Liberman et al.
(1968),  the  nature  of  the  speech  code  is  such  that  while  the  auditory 
system  has  developed  to  deal  with  its  temporal  properties,  the  visual 
system  is  not  capable  of  processing  it  in  a  spatial  layout.  Yet  Victor 
Zue’s  performance  demonstrates  that  it  can  be  accomplished.  The 
question then is why, from a perceptual learning point of view,
context-dependent  features  are  difficult  to  identify.  One  suggestion 
comes  from  recent  work  by  Biederman  and  Shiffrar  (1987)  on  chick- 
sexing.  Biederman  and  Shiffrar  (1987)  demonstrated  that  for  the  skill 
of  determining  the  gender  of  day-old  chicks,  training  time  could  be 
drastically  reduced  by  identifying  non-accidental  distinguishing 
features.  Chick-sexing  reportedly  takes  several  years  of  essentially  trial 
and  error  practice  to  achieve  high  proficiency.  By  identifying  simple 
invariant features, Biederman and Shiffrar were able to reduce these
years of training to a simple rule for finding a distinguishing contour.
Although they didn’t show why learning was originally so difficult,
Biederman and Shiffrar hypothesized that the critical distinguishing
features  were  obscured  by  their  small  size  and  by  being  embedded  in 
other  parts.  In  such  cases,  they  concluded,  it  is  better  to  provide 
instruction  which  points  out  the  features  than  to  hope  they  will  be 
discovered  by  the  learner. 

The  same  causes  of  difficulty  may  apply  to  the  reading  of  speech 
spectrograms.  The  context-dependent  nature  of  the  speech  signal 
causes  the  visual  system  to  break  up  the  display  in  inappropriate 
places.  Additionally,  cognitive  processes  may  be  more  likely  to  group 
certain  parts  together  into  objects  and  restrict  attention  to  these  object 
units  (Ceraso,  1985;  Kahneman,  1973).  This  may  produce  search 
difficulties  if  features  required  to  identify  one  pattern  are  spread  across 
different  objects.  An  otherwise  noticeable  distinguishing  feature  may  be 
difficult  to  discover  because  it  is  in  another  "part."  This  hypothesis  is 
examined  in  the  experiments  which  follow. 

The  question  of  interest  to  the  present  work  is  whether  the 
difficulty  of  learning  spectrogram  reading  is  produced  by  context- 
dependent  relations  among  visual  features.  To  enable  experimental 
manipulation of the relations of interest, pseudo-spectrograms were
used. A computer program generated these pseudo-spectrograms based
on  feature  descriptions  and  interaction  rules  of  real  speech 
spectrograms  (Zue,  unpublished).  A  general  resemblance  to  actual 
spectrograms  was  maintained. 

The  patterns  used  in  the  experiments  were  composed  of  two-  or 
three-phoneme  syllables  in  a  vowel-consonant  or  consonant-vowel- 
consonant  order.  Examples  of  the  pseudo-spectrograms  used  in 
Experiment  1  are  shown  in  Figure  1.  The  consonants  used  were  the 
stop consonants /b/, /p/, /t/, /k/, /d/, and /g/. The vowels used
were /i/ as in "beet," /u/ as in "boot," /ae/ as in "bat," /e/ as in
"bait," /ɔ/ as in "bought," and /o/ as in "boat." Vowel patterns were
quite similar to each other and appeared as wide striated areas with
two dark formants (F1 and F2) and one lighter formant (F3). Vowels
differed  from  each  other  by  their  width  and  the  height  of  their  three 
formants. 

The purpose of the two experiments described below is, first, to
demonstrate  that  a  context-dependent  discrimination  (i.e.,  one  whose 
features  cross  an  object  boundary)  can  produce  learning  difficulty  in  a 
pseudo-spectrogram  reading  task;  and  second,  to  look  at  what 
contribution  segmentation,  as  distinguished  from  other  factors  such  as 
salience,  makes  to  that  difficulty. 

Experiment  1 

To  examine  the  difficulty  of  learning  a  context-dependent 
discrimination,  a  task  was  set  up  to  compare  the  learning  of  three 
pairs  of  consonants.  These  pairs  were  /b/-/p/,  /t/-/k/,  and  /d/-/g/. 
Because  the  objective  was  to  look  at  within-pair  discriminations, 
between-pair  discriminations  were  made  simple  by  giving  members  of 
the  same  pair  similar  widths,  but  members  of  different  pairs  very 
different  widths.  Hence,  /b/  and  /p/  were  both  very  thin,  /t/  and  /k/ 
were both wide, and /d/ and /g/ were both of medium width.
Within-pair discriminations were of three types: multiple cue, single cue, and
single  context-dependent  cue.  The  consonants  /b/  and  /p/  differed 
from  each  other  in  texture,  shape,  and  width,  and  could  be 
distinguished on any of these dimensions. The consonants /t/ and
/k/ could be reliably distinguished only by a single cue. They had the
same  shape,  width,  and  texture,  but  a  different  number  of  formants. 
The  consonants  /d/  and  /g/  could  also  be  distinguished  only  by  a 
single  cue,  but  this  cue  could  not  be  found  by  looking  at  the 
consonant  pattern  itself.  The  shape,  width,  and  texture  of  /d/  and  /g/ 
were  identical  and  the  only  way  to  tell  them  apart  was  by  their 
influence  on  an  adjacent  vowel.  All  of  the  consonants,  except  /g/, 
made  the  second  and  third  formants  of  an  adjacent  vowel  curve 
slightly  downward  at  the  consonant-vowel  boundary.  The  consonant 
/g/  made  the  second  and  third  formants  curve  toward  each  other  and 
meet  at  the  consonant-vowel  boundary  (velar  pinch). 

The  prediction  for  the  experiment  was  that  the  context- 
dependent  discrimination  would  be  more  difficult  to  learn  than  either 
the  single  or  multiple  cue  discriminations. 

Method 


Subjects 

Ten  subjects  were  recruited  from  the  University  of  Pittsburgh. 
The  subjects  received  credit  towards  an  introductory  psychology  class 
and  $10  for  their  participation. 

Apparatus 

The  pseudo-spectrogram  patterns  were  shown  to  the  subjects  on 
the high resolution display screen of a XEROX 1108 computer.
Subjects  responded  by  using  a  mouse  to  make  selections  from  a 
screen  menu.  The  computer  collected  the  subjects'  responses  and 
provided  accuracy  feedback  to  them. 

Materials 

The  pseudo-spectrogram  patterns  were  generated  by  a  computer 
program  as  screen  bitmaps.  The  patterns  were  346  X  346  pixels  and 
measured  10  cm  X  10  cm  on  the  display  screen.  The  phoneme 
patterns were drawn from descriptions which mapped a random
texture of a particular shade of grey to different regions of the space
the  pattern  was  to  occupy.  The  patterns  were  drawn  as  lines  of  these 
small  texture  patterns,  the  length  of  which  was  predetermined  except 
when  a  line  bordered  a  blank  area.  In  that  case  the  ending  point  of 
the  line  was  set  to  a  random  number  within  10  pixels  (3  mm)  of  its 
predetermined  ending  point.  Texture  and  line-length  randomization 
thus provided a small amount of random variability in reappearances
of  the  same  phoneme. 
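
The line-length randomization can be sketched as follows. This is a minimal Python illustration, not the original generator program: the function name, the symmetric direction of the jitter, and the uniform distribution are assumptions. Only the 10-pixel limit and the pattern dimensions come from the description above.

```python
import random

# 346-pixel patterns were displayed at 10 cm, per the Materials description.
PIXELS_PER_CM = 346 / 10


def jittered_end(nominal_end, borders_blank, max_jitter=10, rng=random):
    """Return the end point (in pixels) of one texture line.

    Hypothetical helper: lines that border a blank area get a random
    offset of at most `max_jitter` pixels; all other lines keep their
    predetermined length.
    """
    if not borders_blank:
        return nominal_end
    # Assumed: offsets are uniform and may fall on either side.
    return nominal_end + rng.randint(-max_jitter, max_jitter)


# The 10-pixel limit expressed in millimetres (roughly 3 mm).
jitter_mm = 10 / PIXELS_PER_CM * 10
```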

The  patterns  for  the  phonemes  /b/  and  /p/  were  thin  long  lines 
of either a more striated (/b/) or more random (/p/) texture. For the
phonemes /t/ and /k/, the patterns were a background of random
texture with either a single dark area (for /k/) or two dark areas (for
/t/). Because the descriptions for the background textures of /t/ and
/k/  were  identical,  the  only  reliable  way  of  distinguishing  between 
them  was  the  presence  of  the  extra  dark  area  in  /t/.  The  phonemes 
/d/ and /g/ appeared as long striated patterns before a vowel and as
short striated patterns with two appendages after a vowel, but because
their  descriptions  were  identical,  the  only  reliable  way  to  distinguish 
between  them  was  by  the  convergence  or  lack  of  convergence  of  the 
formants  in  the  adjacent  vowel.  Vowel  patterns  appeared  as  a  striated 
uniform  background  with  two  dark  lower  bars  below  a  lighter  bar. 
Vowels  could  be  discriminated  by  the  amount  of  space  between  their 
formants.  When  vowel  formants  were  curved  by  the  presence  of  an 
adjacent  /g/,  only  the  center  of  the  pattern  could  be  used  to 
determine  the  real  distance  between  formants. 

Design 


Subjects participated in four one-hour sessions held on
consecutive  days  except  for  one  of  the  subjects  who  participated  in 
only  three  sessions  but  learned  all  of  the  discriminations.  The 
spectrogram patterns the subjects saw were all possible
consonant-vowel-consonant combinations of the consonants /b/, /p/, /t/, /d/,
/g/, /k/, and the vowels /i/, /e/, /ae/, /ɔ/, /o/, /u/. The total
number of different combinations was 216. Half of these "words" (108)
were used in each session so that after four sessions the subjects saw
each word pattern only twice. To control for the frequency of seeing
each phoneme, the words were blocked into groups of six in which
each  consonant  appeared  once  in  prevocalic  and  postvocalic  form,  and 
each  vowel  appeared  once.  A  subject  saw  18  such  blocks  in  each 
session.  Before  each  session,  the  order  of  the  words  within  each  block 
and  the  order  of  the  blocks  within  the  session  were  randomized. 
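
One way to build word blocks that satisfy these constraints is sketched below in Python. The cyclic-shift construction, the alternation of the two halves of the word set across sessions, and all names are assumptions made for illustration; the source does not say how the original blocks were assembled.

```python
import random

CONSONANTS = ["b", "p", "t", "d", "g", "k"]
# "aw" is a stand-in label for the vowel of "bought".
VOWELS = ["i", "e", "ae", "aw", "o", "u"]


def make_blocks():
    """Partition all 216 CVC words into 36 blocks of six words.

    Within a block every consonant occurs once before the vowel and
    once after it, and every vowel occurs once.
    """
    blocks = []
    for c_shift in range(6):
        for v_shift in range(6):
            block = [(CONSONANTS[i],
                      VOWELS[(i + v_shift) % 6],
                      CONSONANTS[(i + c_shift) % 6])
                     for i in range(6)]
            blocks.append(block)
    return blocks


def make_sessions(seed=0):
    """Return four sessions of 18 blocks each, randomized per session."""
    rng = random.Random(seed)
    blocks = make_blocks()
    rng.shuffle(blocks)
    halves = [blocks[:18], blocks[18:]]  # 108 words per half
    sessions = []
    for s in range(4):
        # Assumed: the two halves alternate, so each word is seen twice.
        session = [list(block) for block in halves[s % 2]]
        rng.shuffle(session)             # block order within the session
        for block in session:
            rng.shuffle(block)           # word order within each block
        sessions.append(session)
    return sessions
```

With 18 blocks per session, four sessions give 72 blocks in total, which matches the block numbering used in the analysis of learning points.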

Procedure 

Subjects  were  tested  individually.  A  subject  was  seated  in  front 
of  the  computer  and  shown  how  to  use  a  mouse  to  choose  a  letter 
response  from  a  screen  menu.  The  experimenter  then  briefly  explained 
about  spectrograms  and  told  the  subject  that  his  or  her  task  was  to 
learn  which  letters  were  represented  by  each  pattern.  It  was  made 
clear, however, that the task was a visual one, and the subjects were
discouraged  from  using  strategies  based  on  the  sound  properties  of  the 
phonemes,  such  as  stress  or  pitch. 

When the experiment began, a pseudo-spectrogram pattern
appeared  in  the  center  of  the  display  screen  and  remained  there  until 
a  response  was  given.  Immediately  after  the  pattern’s  appearance,  the 
message "Think about your answer..." appeared above the pattern in a
message  box.  Because  of  program  differences,  three  of  the  subjects 
saw  this  message  on  the  screen  for  20  seconds,  while  for  the 
remaining  subjects  the  message  remained  on  the  screen  for  only  3 
seconds.  This  difference  was  not  expected  to  influence  the  results 
because  most  responses,  especially  early  in  the  experiment,  required 
more  than  20  seconds.  Next,  a  menu  appeared  on  the  screen  along 
with  the  message  "Click  on  the  first  sound  in  the  word."  The  menu 
contained  a  list  of  the  consonant  responses  and  an  example  word  in 
which  the  consonant  is  used.  After  a  subject  selected  one  of  the 
consonants,  a  vowel  menu  appeared  with  the  message  "Click  on  the 
second  sound  in  the  word."  Once  the  vowel  was  selected,  the 
consonant  menu  reappeared  for  the  third  response.  After  the  subject 
made  the  final  response,  the  program  provided  feedback.  If  all  three 
responses  were  correct,  the  message  "That’s  correct"  was  displayed  in 
the message box. Otherwise, the message "That’s wrong" was displayed
along  with  the  correct  answer.  The  pseudo-spectrogram  pattern 
remained on the screen for five seconds after feedback was given. The
subjects were allowed to take a short break halfway through the
session. 

Shortly  after  the  beginning  and  toward  the  end  of  each  session, 
the  experimenter  turned  on  a  tape  recorder  and  asked  the  subject  to 
continue  with  the  next  six  trials  but  describe  verbally  what  he  was 
looking  at  in  the  pattern  and  how  he  decided  what  to  respond. 

Results  and  Discussion 


A  subject  was  considered  to  have  learned  a  consonant  pair  if  he 
or  she  responded  correctly  to  four  consecutive  trial  blocks  (8  problems) 
with  one  allowed  error  on  the  third  or  fourth  block.  The  learning  point 
was  taken  as  the  first  of  the  four  blocks.  Not  all  of  the  subjects  were 
able  to  learn  all  three  consonant  discriminations  within  the  allotted 
time. Of the 10 subjects, 9 learned the /b/-/p/ distinction, 6 learned
the  /t/-/k/  distinction,  and  2  learned  the  /d/-/g/  distinction. 
McNemar’s  exact  test  for  correlated  proportions  indicated  that 
significantly  more  subjects  learned  the  /b/-/p/  distinction  than  the 
/d/-/g/ distinction (p<.02), but the test of whether more people
learned  the  /t/-/k/  distinction  than  learned  the  /d/-/g/  distinction 
was  not  significant  (p=.10). 
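
The exact form of McNemar's test depends only on the discordant subjects, those who learned one distinction but not the other. The Python sketch below is an illustration rather than the authors' computation. The text reports only the marginal counts, so the discordant counts used in the example (7 and 0) rest on the assumption that both subjects who learned /d/-/g/ also learned /b/-/p/; the two-sided form of the test is likewise an assumption.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: subjects who learned only the first distinction
    c: subjects who learned only the second distinction
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial tail with success probability 1/2, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts (assumed, not reported in the text):
# 7 subjects learned /b/-/p/ only, 0 learned /d/-/g/ only.
p_value = mcnemar_exact(7, 0)  # 0.015625
```

Under those assumptions the result would fall below the .02 bound reported for the /b/-/p/ versus /d/-/g/ comparison.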

A  matched  pairs  sign  test  was  used  to  test  whether  the  learning 
points  for  the  /b/-/p/  and  /t/-/k/  distinctions  were  earlier  than  for 
the  /d/-/g/  distinction.  Unlearned  distinctions  were  considered  to 
have  a  learning  point  of  at  least  73  (i.e.,  one  greater  than  the  last 
block).  If  two  distinctions  were  unlearned,  the  learning  points  were 
considered  to  be  tied.  Using  this  procedure,  the  /b/-/p/  and  /t/-/k/ 
distinctions  were  found  to  have  been  learned  at  an  earlier  point  than 
the /d/-/g/ distinction (p<.01 and p<.02, respectively).
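
The scoring convention described above (unlearned distinctions coded as 73, tied pairs dropped) can be illustrated with a small Python sketch of a matched-pairs sign test. The learning points in the example are hypothetical and are not the study's data, and the one-sided form of the test is an assumption.

```python
from math import comb

UNLEARNED = 73  # one more than the 72 blocks (4 sessions x 18 blocks)


def sign_test(first, second):
    """One-sided sign test that `first` was learned earlier than `second`."""
    earlier = sum(a < b for a, b in zip(first, second))
    later = sum(a > b for a, b in zip(first, second))
    n = earlier + later  # tied pairs are dropped
    if n == 0:
        return 1.0
    # Probability of at least `earlier` successes in n fair coin flips.
    return sum(comb(n, i) for i in range(earlier, n + 1)) / 2 ** n


# Hypothetical learning points for five subjects (NOT the study's data).
multiple_cue = [5, 12, 30, UNLEARNED, 8]
context_cue = [40, UNLEARNED, UNLEARNED, UNLEARNED, UNLEARNED]
p_value = sign_test(multiple_cue, context_cue)  # 0.0625
```

The fourth hypothetical subject learned neither distinction, so that pair is tied and contributes nothing to the test.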

To  obtain  a  measure  of  how  much  earlier  the  single-  and 
multiple-cue  distinctions  were  learned,  it  was  necessary  for  the 
subjects  to  have  learned  to  distinguish  at  least  two  of  the  three 
consonant  pairs.  Four  subjects  failed  to  meet  this  criterion  and  were 
not  included  in  the  measure.  Of  the  six  remaining  subjects,  only  two 
learned the /d/-/g/ distinction. For the others, the learning point was
estimated as 73. Because this value underestimates the true learning
point, the measure of when the /d/-/g/ distinction was learned is
conservative.  Based  on  this  measure,  the  mean  number  of  trial  blocks 
required  for  subjects  to  learn  each  consonant  pair  discrimination  is 
provided  in  Table  1.  According  to  these  estimates,  the  /d/-/g/ 
distinction  appears  to  require  a  considerably  greater  amount  of 
learning  time  than  either  the  /b/-/p/  or  /t/-/k/  distinctions 
(approximately  40  additional  blocks). 


                              Consonant Distinction

                              Multiple Cue   Single Cue   Context Cue
                              /b/-/p/        /t/-/k/      /d/-/g/

Mean                          20.17          29.17        66.17
Standard deviation            17.81          23.26        12.17
Number of estimated points    0              0            4

Table 1: Mean number of trial blocks to reach learning
criterion for each consonant distinction.


These  results  suggest  that  a  context-dependent  discrimination 
can  be  difficult  to  learn.  Fewer  subjects  were  able  to  learn  the  /d/-/g/ 
discrimination  in  the  allotted  time.  The  test  on  proportion  of  learners 
for  each  distinction  showed  that  significantly  more  people  learned  the 
multiple-cue  distinction  than  the  context-dependent  one.  The 
difference  between  the  proportion  who  learned  the  single-cue 
distinction and the context-dependent one, though not significant, was
large  (.60  vs  .20).  For  those  subjects  who  did  learn  the  context- 
dependent  discrimination  (or  who  were  optimistically  presumed  to  be 
about  to  learn  it  when  the  experiment  ended),  learning  took  longer 
than  for  either  the  multiple  cue  or  the  single  cue  discrimination. 
These findings suggest that having to discover a context-dependent
discrimination  could  account  for  some  of  the  difficulty  encountered  in 
acquiring the skill of speech spectrogram reading.

However, these results must be viewed with caution. The
experiment  examined  learning  of  a  realistic  and  complex  pattern,  and 
likely confounded several factors with the context-dependent vs
non-context-dependent comparison. These factors must be ruled out before
learning  difficulty  can  be  unambiguously  assigned  to  the  context- 
dependent  manner  in  which  the  stimulus  is  segmented.  One  such 
factor  is  cue  salience.  It  may  simply  have  been  harder  for  the  subjects 
to  notice  the  formant  curving  cue  than  the  other  cues.  This 
explanation  is  unlikely  given  that  8  of  the  10  subjects  mentioned  in 
their  verbal  reports  that  there  was  something  unusual  about  the 
appearance  of  the  formants  (i.e.,  that  they  were  curved  or  straight). 
Nevertheless,  salience  differences  must  be  ruled  out.  Another 
confounding  factor  is  whether  task  demands,  rather  than  segmentation 
difficulty, made the /d/-/g/ distinction difficult to learn. Subjects may
have noticed the formant curving cue, but because they also were
required to learn the identity of the vowel, may have tried to use
formant curving to distinguish among the different vowels. This may
have  "used  up"  the  cue,  making  it  unavailable  for  use  in  distinguishing 
the  consonants.  There  is  support  for  this  possibility  in  the  verbal 
reports  made  by  several  subjects  who  mentioned  the  formant  curving 
in  conjunction  with  vowel  discriminations.  These  two  possible 
alternative  explanations  are  examined  in  Experiment  2. 

Experiment  2 

In  Experiment  2,  the  goal  was  to  try  to  determine  whether  the 
learning  difficulty  observed  in  Experiment  1  was  due  to  context- 
dependent  segmentation,  to  some  other  factor  such  as  salience  or  task 
demands,  or  to  some  interaction  of  these  factors.  Segmentation,  in  this 
context,  refers  to  how  the  cognitive  system  divides  a  pattern  into 
objects.  Segmentation  was  manipulated  by  having  two  cues  occur 
within  the  same  object  or  by  splitting  them  between  two  objects. 
Salience  is  how  noticeable  the  features  are.  This  was  measured  by 
having  a  separate  group  of  subjects  circle  the  parts  in  the  spectrogram 
patterns  used  in  this  experiment.  It  was  also  controlled  for  in  the 
experimental  design  by  having  different  groups  of  subjects  learn  each 
distinguishing cue both as a between-object cue and as a within-object
cue. Finally, task demands refer to whether the subject was to treat
the  different  phonemes  as  separate  parts  in  making  a  response.  In  this 
experiment  subjects  made  only  a  single  response  to  the  whole  pattern, 
but  an  attempt  to  vary  task  demands  was  made  through  instructional 
bias. 


Method 


Materials 

The  pseudo-spectrogram  patterns  used  in  Experiment  2  were 
similar  to  those  used  in  Experiment  1,  but  to  control  for  all  of  the 
independent  variables,  several  changes  were  made.  First,  the  patterns 
consisted  of  only  two  phonemes:  a  vowel-like  pattern,  followed  by  a 
consonant-like  pattern.  The  vowel  patterns  were  either  thin  (T)  or  wide 
(W),  and  had  formants  which  were  either  straight  (S)  or  curved  (C)  and 
either  high  (H)  or  low  (L)  in  frequency  (/i/  vs  /ae/).  Consonant 
patterns  could  be  large  (L)  or  small  (S)  and  had  either  one  (O)  or  two 
(T)  formants.  Formants  appeared  as  dark  spots  on  the  large 
consonants  and  as  protrusions  on  the  small  consonants.  Figure  2 
shows  some  examples  of  these  patterns.  The  pseudo-spectrogram 
patterns  were  generated  in  the  same  way  as  those  in  Experiment  1; 
the 32 vowel-consonant combinations were drawn 8 times for a total of
256  patterns. 
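
The stimulus count follows from crossing the five two-level features described above. A short Python sketch of the enumeration is given below; the variable names are illustrative, and the letter codes repeat those used in the text.

```python
from itertools import product

VOWEL_WIDTH = ["T", "W"]         # thin, wide
FORMANT_SHAPE = ["S", "C"]       # straight, curved
FORMANT_HEIGHT = ["H", "L"]      # high, low
CONSONANT_SIZE = ["L", "S"]      # large, small
CONSONANT_FORMANTS = ["O", "T"]  # one, two

# Every vowel-consonant combination: 2 * 2 * 2 vowels x 2 * 2 consonants.
patterns = list(product(VOWEL_WIDTH, FORMANT_SHAPE, FORMANT_HEIGHT,
                        CONSONANT_SIZE, CONSONANT_FORMANTS))

# Each combination was drawn 8 times.
drawings = [(pattern, copy) for pattern in patterns for copy in range(8)]
```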

To  assess  the  salience  of  the  patterns'  visual  features,  a  group 
of  15  subjects  (not  the  same  as  those  in  the  learning  task)  were  given 
a  stack  of  the  32  different  patterns  and  asked  to  circle  the  "important 
parts."  The  results  of  this  circling  task  are  given  in  Table  2.  Of 
relevance  to  the  present  experiment  is  the  finding  that  the  subjects 
circled  the  vowel  formants  an  average  of  98%  of  the  time,  while 
circling  the  consonant  formants  an  average  of  only  76%  of  the  time. 
Furthermore,  the  subjects  tended  to  circle  curved  vowel  formants  as  a 
single  part  (67%  of  the  time),  and  straight  vowel  formants  as  separate 
parts  (83%  of  the  time).  The  first  consonant  formant  was  circled  more 
often  than  the  second  (81%  vs  68%),  and  formants  in  the  large 
consonants  were  circled  more  often  than  formants  in  the  small 
consonants (90% vs 59%). Hence, some of the difference in salience
between vowel formants and consonant formants may be due to
difficulty  seeing  the  small  consonant  formants  as  distinct  parts. 


Feature                          Proportion

Whole Vowel                         .13
1st Vowel Formant                   .97
2nd Vowel Formant                   .99
3rd Vowel Formant                   .98
All Other Vowel Features            .22
Whole Consonant                     .33
1st Consonant Formant               .83
2nd Consonant Formant               .69
All Other Consonant Features        .29

Table 2: Proportion of times a feature was circled in part-circling task.
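
The average circling rates quoted in the text (98% for vowel formants, 76% for consonant formants) can be recovered from the individual formant proportions in Table 2. The following is a minimal sketch of that arithmetic; the variable names are illustrative only.

```python
from statistics import mean

# Proportions copied from Table 2 (formant rows only).
vowel_formants = [0.97, 0.99, 0.98]   # 1st, 2nd, 3rd vowel formant
consonant_formants = [0.83, 0.69]     # 1st, 2nd consonant formant

vowel_mean = mean(vowel_formants)
consonant_mean = mean(consonant_formants)

print(f"vowel formants:     {vowel_mean:.2f}")
print(f"consonant formants: {consonant_mean:.2f}")
```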

Design 

The  goal  of  the  experiment  was  to  assess  whether  a 
within-object  cue  would  be  learned  more  readily  than  a  between-object 
cue.  To  avoid  confounding  the  type  of  cue  (formant  curving  or  number 
of  formants)  with  the  location  of  the  cue  (within  or  between  objects), 
each  cue  type  was  learned  as  both  a  within-object  cue  and  as  a 
between-object  cue.  Because  this  could  not  be  manipulated  within 
subjects,  an  incomplete  blocks  design  was  used.  Each  subject  provided 
two  observations  from  the  2X2  (Cue  Type  X  Cue  Location)  design, 
and  a  block  of  two  subjects  with  complementary  conditions  constituted 
a  single  replication  of  the  design.  This  confounds  the  Cue  Type  X  Cue 
Location  interaction  with  subjects,  but  by  running  enough  replications, 
this  effect  could  be  analyzed  as  a  between  block  factor. 

One  additional  factor,  instruction,  was  also  included  as  a 
between  block  factor.  One  half  of  the  blocks  received  neutral 
instructions  which  asked  them  to  learn  to  associate  the  whole  pattern 
with a response; the other half received biasing instructions which
asked them to learn the half of the pattern containing the
within-object cue. The within and between block designs made up four
conditions: Neutral Instructions, Curve-Within (NCW); Neutral
Instructions, Curve-Between (NCB); Biased Instructions, Curve-Within
(BCW); and Biased Instructions, Curve-Between (BCB). The
Curve-Within/Curve-Between distinction refers to the type of rules
subjects were to learn. Table 3 shows these rules for each condition.


Condition
(Instructions-Curve location)   Cons.   Left Pattern      Right Pattern

Neutral-Within (NCW)            /g/     Curved, Thin
                                /d/     Straight, Thin
                                /k/     Wide              One formant
                                /t/     Wide              Two formants

Neutral-Between (NCB)           /g/     Curved            Small
                                /d/     Straight          Small
                                /k/                       Large and One formant
                                /t/                       Large and Two formants

Biased-Within (BCW)             /g/     Curved, Thin
                                /d/     Straight, Thin
                                /k/     Wide              One formant
                                /t/     Wide              Two formants

Biased-Between (BCB)            /g/     Curved            Small
                                /d/     Straight          Small
                                /k/                       Large and One formant
                                /t/                       Large and Two formants

Table 3: Rules for discriminating patterns in Experiment 2
The Curve-Within groups learned the formant curving cue as a
within-object cue and the number of formants cue as a between-object
cue; the Curve-Between groups learned the formant curving cue as a
between-object cue and the number of formants cue as a within-object
cue.


Subjects participated in a single two-hour session. The
pseudo-spectrogram  patterns  the  subjects  saw  were  all  possible 
vowel-consonant  combinations  as  described  above.  To  control  for  the 
frequency  of  seeing  each  phoneme,  the  patterns  were  grouped  into 
blocks  of  eight  in  which  each  consonant  appeared  twice  and  each 
vowel  appeared  once.  Before  each  session,  the  order  of  the  patterns 
within  each  block,  and  the  order  of  the  blocks  within  the  session  were 
randomized  for  each  subject. 
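
One way to build blocks that satisfy the stated constraint (each vowel once and each consonant twice per block of eight) is a simple rotation of consonants across vowels. The sketch below is an assumption about how such blocking could be done, not a description of the original program; the function names, the rotation scheme, and the seed handling are all hypothetical.

```python
import random
from itertools import product

# Letter codes follow the text: vowels T/W, S/C, H/L; consonants L/S, O/T.
VOWELS = list(product("TW", "SC", "HL"))   # 8 vowel patterns
CONSONANTS = list(product("LS", "OT"))     # 4 consonant patterns


def make_blocks(rng):
    """Split the 32 combinations into 4 blocks of 8 trials.

    In block `shift`, vowel i is paired with consonant (i + shift) mod 4,
    so every block holds each vowel once and each consonant twice.
    """
    blocks = []
    for shift in range(len(CONSONANTS)):
        block = [
            (vowel, CONSONANTS[(i + shift) % len(CONSONANTS)])
            for i, vowel in enumerate(VOWELS)
        ]
        rng.shuffle(block)  # randomize trial order within the block
        blocks.append(block)
    return blocks


def make_session(seed, repetitions=8):
    """Return a list of 32 blocks (256 trials) in random block order."""
    rng = random.Random(seed)
    session = []
    for _ in range(repetitions):
        session.extend(make_blocks(rng))
    rng.shuffle(session)  # randomize block order within the session
    return session


session = make_session(seed=1)
print(len(session), sum(len(block) for block in session))
```

A limitation of this rotation scheme is that the same eight combinations always share a block; a fuller implementation might re-draw the pairing on each repetition.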

Procedure 

Subjects  were  tested  individually.  Each  subject  was  seated  in 
front  of  the  computer  and  shown  how  to  use  a  mouse  to  choose  a 
letter  response  from  a  screen  menu.  Then  the  instructions  for  the 
experiment  were  displayed  on  the  screen.  Subjects  in  the  neutral 
conditions were told their task was to learn to identify which pattern
was  displayed;  subjects  in  the  biased  conditions  were  told  to  identify 
the  left  (or  right)  pattern.  To  ensure  that  the  subjects  in  the  biased 
condition  read  the  instructions,  they  were  asked  to  identify  which  half 
(left  or  right)  of  the  pattern  they  were  to  learn.  If  they  were  incorrect, 
the  instructions  reappeared  on  the  screen. 

The experiment began with a pseudo-spectrogram pattern
appearing in the center of the display screen. The message "Think
about your answer..." appeared in a message box above the pattern for
3 seconds. Then a menu appeared on the screen along with the
message "Click on the first sound in the word." The menu contained a
list of four responses: /t/, /k/, /d/, and /g/. After the subject made a
response, the program provided feedback. If the response was correct,
the message "That’s correct" was displayed in the message box.
Otherwise, the message "That’s wrong" was displayed along with the
correct answer. Once feedback was given, the pseudo-spectrogram
pattern remained on the screen for 10 seconds before being replaced
by the pattern for the next trial. Every 32 trials, the subject was
allowed to take a short break before continuing.

After  the  session,  the  experimenter  turned  on  a  tape  recorder 
and  asked  the  subject  to  identify  8  patterns  and  describe  what  she 
looked  at  in  the  pattern  and  how  she  decided  what  to  respond. 

Subjects 

Forty-eight  introductory  psychology  students  from  the  University 
of  Pittsburgh  participated  for  course  credit.  Two  subjects,  both  from 
the  Neutral-Curve-Between  condition,  were  replaced:  one  quit  the 
session  early,  the  other  hadn’t  slept  for  48  hours  prior  to  the 
experiment  session  and  showed  no  learning.  The  remaining  subjects 
were  randomly  assigned  to  the  four  conditions  with  the  constraint  of 
obtaining  9  full  or  partial  learners  (as  described  below)  in  each 
condition. 


Results  and  Discussion 


As in Experiment 1, subjects had considerable difficulty learning
both the within- and between-object distinctions. A subject was
considered to have learned a distinction when correct responses were
made on two consecutive blocks (8 problems) with one allowed error
on the second block (two subjects were also considered to have learned
a distinction on their final block if the final block was correct and they
gave the correct rule for the distinction in their post-session interview).
By this criterion, the 48 subjects fall into three categories: full
learners, non-learners, and partial learners. Full learners were those
who learned both the between- and within-object distinctions;
non-learners learned neither distinction; partial learners were those
who learned only one of the two distinctions. Table 4 summarizes how
the subjects performed. Eighteen subjects were full learners, twelve
were non-learners, and eighteen were partial learners. Of the partial
learners, 13 learned only the within rule and 5 learned only the
between rule. Of the non-learners, one was from the NCB condition,
two from the BCW condition, and nine from the NCW condition.

Discriminations learned      NCW   NCB   BCW   BCB

Both discriminations           4     9     1     4
One discrimination
  Within rule only             5     0     7     1
  Between rule only            0     0     1     4
Neither discrimination         9     1     2     0

Table 4: Discriminations learned, by condition
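
The tallies quoted in the text can be checked against the cell counts of Table 4. The sketch below assumes the column order NCW, NCB, BCW, BCB given in the table; the variable names are illustrative.

```python
CONDITIONS = ["NCW", "NCB", "BCW", "BCB"]

# Cell counts copied from Table 4.
table4 = {
    "both":         dict(zip(CONDITIONS, [4, 9, 1, 4])),
    "within_only":  dict(zip(CONDITIONS, [5, 0, 7, 1])),
    "between_only": dict(zip(CONDITIONS, [0, 0, 1, 4])),
    "neither":      dict(zip(CONDITIONS, [9, 1, 2, 0])),
}

# Row totals: full learners, the two kinds of partial learner, non-learners.
row_totals = {row: sum(cells.values()) for row, cells in table4.items()}

# Full or partial learners per condition (the assignment constraint was 9 each).
learners = {
    cond: sum(table4[row][cond] for row in ("both", "within_only", "between_only"))
    for cond in CONDITIONS
}

# Curve-Within conditions are the ones in which the number of formants cue
# is the between-object cue.
curve_within, curve_between = ("NCW", "BCW"), ("NCB", "BCB")
non_learners = tuple(
    sum(table4["neither"][c] for c in group) for group in (curve_within, curve_between)
)
within_only = tuple(
    sum(table4["within_only"][c] for c in group) for group in (curve_within, curve_between)
)

print(row_totals, learners, non_learners, within_only)
```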


A matched pairs sign test was used to test the main effects of
Cue Location and Cue Type for those subjects who were full or partial
learners. For partial learners, the learning point of the unlearned
distinction was considered to be at least 17 (the last trial block plus
one). By this test, the main effect of Cue Location was not significant
(z = 1.39, p < .09), but the main effect of Cue Type was significant
(z = 4.18, p < .001). The subjects learned the formant curving cue before
the number of formants cue significantly more often than they learned
them in the reverse order. To test the interaction of Cue Type X Cue
Location, each subject’s performance was categorized according to its
sign. A chi-square test of independence revealed that the interaction
was significant (χ²(2) = 19.35, p < .001). Formant curving was learned first
as a within-object cue just as often as it was learned first as a
between-objects cue, but the number of formants cue was learned first
as a within-object cue more often than as a between-objects cue.
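
For readers unfamiliar with the matched pairs sign test, the following sketch shows one common large-sample form of it: a normal approximation with a continuity correction. Whether the original analysis used this exact form is an assumption. The counts passed to the function are hypothetical (the text does not report them) and were chosen only because they give z values of the same size as those reported above.

```python
from math import erfc, sqrt


def sign_test_z(n_plus, n_minus):
    """Normal-approximation sign test with continuity correction.

    n_plus and n_minus are the numbers of pairs with positive and negative
    differences (ties excluded). Returns (z, one-tailed p).
    """
    n = n_plus + n_minus
    if n == 0:
        raise ValueError("at least one non-tied pair is required")
    # (|k - n/2| - 0.5) / (sqrt(n) / 2), written in terms of the two counts.
    z = max(abs(n_plus - n_minus) - 1, 0) / sqrt(n)
    p_one_tailed = 0.5 * erfc(z / sqrt(2))
    return z, p_one_tailed


# Hypothetical counts, for illustration only.
z_type, p_type = sign_test_z(29, 4)
z_loc, p_loc = sign_test_z(21, 12)
print(f"z = {z_type:.2f}, p = {p_type:.5f}")
print(f"z = {z_loc:.2f}, p = {p_loc:.3f}")
```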

To obtain a measure of when the distinctions were learned, the
learning point for unlearned distinctions was estimated as the 17th
block. This value underestimates the true learning block and makes
the measure conservative. Most of these estimations were made for the
between-object distinction when it involved the number of formants
cue. This is also consistent with the observation that an unusually
large number of non-learners were found in the conditions which
required learning this distinction (the NCW and BCW conditions).
With these estimates, the mean learning block for each distinction
and condition was calculated. These values are given in Table 5. The
measures indicate that the number of formants cue was learned at
least five blocks earlier as a within-object cue than as a between-object
cue, but the formant curving cue was learned at about the same point
for both locations.


                                     Cue Location
Cue Type                           Within    Between

Number of formants
  Mean                              10.28      15.56
  Standard deviation                 5.04       2.59
  Number of estimated points            4         12
Formant curving
  Mean                               8.72       7.44
  Standard deviation                 4.23       3.75
  Number of estimated points            1          1

Table 5: Mean number of trial blocks to reach learning criterion for each consonant distinction
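
The block differences cited in the text can be read off the cell means of Table 5. The sketch below assumes that the first value of each pair in the table belongs to the Within column and the second to the Between column, and it averages the two locations without weighting; the names are illustrative.

```python
# Mean learning blocks copied from Table 5.
table5 = {
    "number_of_formants": {"within": 10.28, "between": 15.56},
    "formant_curving":    {"within": 8.72,  "between": 7.44},
}

# Cost of the between-object location for each cue (positive = learned later).
location_effect = {
    cue: means["between"] - means["within"] for cue, means in table5.items()
}

# Unweighted mean learning block for each cue type, and the gap between them.
cue_means = {cue: (m["within"] + m["between"]) / 2 for cue, m in table5.items()}
cue_type_gap = cue_means["number_of_formants"] - cue_means["formant_curving"]

for cue, diff in location_effect.items():
    print(f"{cue}: between - within = {diff:.2f} blocks")
print(f"number of formants - formant curving = {cue_type_gap:.2f} blocks")
```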


These results suggest that lack of salience may play an
important part in making this type of skill difficult to learn. The sign
test demonstrates that the formant curving cue was more often learned
before the number of formants cue, and the estimates of learning
points show that the formant curving cue was learned at least 4
blocks earlier, on average. The cause of this difference is likely to be
cue salience. In the part-circling task, more subjects circled the vowel
formants than the consonant formants, suggesting that the vowel
formants are more salient. The effect of salience, however, does not
explain the learning difficulty observed in the first experiment. In
Experiment 1, number of formants as a within-object cue was learned
sooner and more often than the formant curving cue as a
between-object cue. If this were due to salience, then we should have found
that the number of formants cue was learned sooner than the formant
curving cue in Experiment 2.

Nor can segmentation by itself account for the observed learning
difficulty. Cue Location was not significant, and even the interaction of
Cue Type and Cue Location does not produce a simple explanation.
Context-dependent segmentation does appear to produce learning
difficulty, but this effect may be restricted to cues of lower salience.
The chi-square test on the interaction of Cue Type and Cue Location
showed that more subjects learned the formant curving cue before the
number of formants cue when the number of formants cue was a
between-objects cue, but when the number of formants cue was a
within-objects cue, the order of learning was indifferent to cue type.
Thus, difficulty due to cue location was found for the less salient
number of formants cue but not for the more salient formant curving
cue. However, the degree of impairment for less salient cues appears
to be substantial. More non-learners (11 vs 1) and within-rule-only
learners (12 vs 1) were reported in the conditions which required
learning the number of formants cue as a between-objects cue.
Additionally, the conservative estimate of learning points indicates that
this cue was learned at least five blocks later as a between- than as a
within-object cue.

Yet  segmentation  does  not  explain  the  learning  difficulty 
observed in the first experiment. In Experiment 1, the formant curving
cue  as  a  between-object  cue  was  found  to  be  much  harder  to  learn 
than  the  number  of  formants  cue  as  a  within-object  cue.  This  finding 
was  not  replicated  in  the  second  experiment.  In  fact,  the  opposite  was 
found.  Neither  salience  nor  segmentation  can  account  for  this 
difference  because  neither  was  changed  between  the  two  experiments. 
The  only  major  change  was  the  learning  task. 

Presumably, the reason the formant curving cue was difficult to
learn in Experiment 1 was the vowel response required in that task.
This was not manipulated in the second experiment, so it is impossible
to be certain. It is interesting to note, however, that the difficulty
disappeared when the vowel identification task was eliminated in
Experiment 2. Unfortunately, the manipulation of instructional bias in
this experiment was too weak to clarify this question. Half of the
subjects were instructed to "learn to identify the right [or "left"] hand
part" of the pattern, but in post-experiment interviews several admitted
to ignoring these instructions. Instructional bias did not significantly
interact with either Cue Type or Cue Location (χ²(2) = 3.31, p > .10;
χ²(2) = 3.82, p > .10, respectively). Future research should determine
whether task demands cause the difficulty observed in Experiment 1
by more strongly manipulating task demands within a single
experiment.


General  Discussion 

The  two  experiments  presented  here  point  to  several  factors 
which  can  affect  the  difficulty  of  learning  to  read  speech  spectrograms. 
The  original  hypothesis,  that  learning  difficulty  was  caused  by  context- 
dependent  relations  created  by  the  way  the  visual  system  segments 
spectrogram  patterns,  has  been  shown  to  be  too  simple.  Learning 
difficulty  for  this  skill  may  be  affected  by  the  interaction  of 
segmentation  with  cue  salience  and  task  demands.  Segmentation  was 
shown  to  have  a  considerable  influence  on  difficulty,  but  this  influence 
may  be  restricted  to  less  salient  cues.  Segmentation  may  also  be 
influenced  by  the  demands  of  the  learning  task.  Although  the 
experiments  did  not  demonstrate  this,  it  is  likely  that  the  type  of 
response  required  by  the  learning  task  influences  task  difficulty.  The 
following  discussion  examines  in  more  detail  why  segmentation  might 
interact  with  these  factors. 

The interaction of segmentation with cue salience can be
explained by assuming that whatever learning difficulties are produced
by segmentation can be overcome by a highly salient cue. Salience
has long been known to influence hypothesis selection in
discrimination learning tasks (Trabasso & Bower, 1968). Highly salient
cues are likely to be tried first as hypotheses. If the effect of
segmentation is to make certain cues less available for selection as
hypotheses, then it is easy to understand why a high degree of
salience would overcome this effect. This explanation is supported by
the results of the second experiment reported here, in which the mean
learning block was about the same for all distinctions except for the
condition when the less salient number of formants cue was a
between-objects cue. When the formant curving cue was a
between-objects cue, its highly salient nature made it available for attention
anyway.

Although neither experiment directly manipulated task demands,
the difference between the results of the two experiments suggests that
the type of response the subjects were required to give was also
important. In the first experiment, where the subjects were required to
respond to both consonants and vowels, they had difficulty learning
the highly salient formant curving cue as a between-object cue. In the
second experiment, where subjects made only a single response to the
whole pattern, formant curving was no more difficult to learn as a
between-object cue than as a within-object cue. Since subjects in
Experiment 1 reported using formant curving to distinguish the vowel
responses, it seems likely that including the vowel response made it
more difficult to notice the relevance of the formant curving to the
consonant distinction, perhaps in the following way. A subject might
select the cue as a hypothesis for vowel identification. When this
hypothesis was disconfirmed, the hypothesis may have become less
likely to be selected immediately again. If the formant curving cue was
selected as relevant for vowel discrimination because of the way that
spectrograms are segmented visually, it might be less available for part
of a consonant discrimination. In the second experiment, when the
vowel identification task was eliminated, subjects were more able to
learn formant curving as a between-object cue.

Task  demands  may  also  have  increased  learning  difficulty  by 
reinforcing  any  existing  segmentation  biases.  If  subjects  were  required 
to  make  two  responses  to  a  pattern,  they  may  have  been  more  likely  to 
see  the  pattern  as  two  distinct  parts,  and  possibly  to  assign  one 
response  to  one  part,  and  the  other  response  to  the  remaining  part. 
This may have enhanced any existing bias against crossing part
boundaries.  This  hypothesis  can  be  tested  only  by  future  research. 

The  main  conclusion  of  the  present  research  is  to  confirm  the 
influence  of  segmentation  on  learning  difficulty  in  speech  spectrogram 
reading.  Although  segmentation  was  not  found  to  be  the  sole 
determiner  of  such  difficulty,  in  combination  with  other  stimulus  and 
task  variables  it  appeared  to  have  a  substantial  influence.  One  way  of 
thinking about the effect of segmentation is as a within-object search
bias. People may be biased toward searching within an object’s part
boundaries (contour) for discriminating features, before considering
features outside those boundaries. This bias, however, can be
overridden by a highly salient feature in another part. The learning task
is also important to the within-object search bias. If a feature can be
used as a within-object cue, then it may be less likely to be considered
as  a  between-object  cue.  Such  factors  may  have  led  the  subjects  in 
Experiment  1  to  believe  incorrectly  that  formant  curving  indicated 
vowel  identity,  and  may  have  impaired  their  ability  to  associate  it  with 
consonant  identity. 

The existence of a within-object search bias is consistent with
several theories of visual attention. According to the view taken by
Kahneman (1973; Kahneman & Henik, 1981) and Ceraso (1985),
attention to a visual scene is allocated by object units. According to
Kahneman’s (1973) model of attention and perception, preattentive
visual processes divide a display into units according to stimulus
properties and simple grouping rules (such as Gestalt rules). These
units are given figural emphasis (attention) based on factors such as
figure-ground relations, features which make something stand out,
and intention. Units which receive this attention are then matched
against memory structures to test for recognition. Visual search
involves the intentional switching of figural emphasis from object to
object, or the attraction of figural emphasis based on a feature (either
stimulus or response selected) which distinguishes the target.
According to the results of the experiments presented above, the
features of a target phoneme unit are more likely to be considered
than features of other phonemes, unless those other features are
highly salient. This result may be due to the way attention is allocated
to a whole part unit. If whole phonemes are attended as wholes, then
the features within the attended phoneme will receive figural emphasis
and be further processed as potential hypotheses. However, if a highly
salient feature, one which draws attention to itself, is in a neighboring
phoneme, it may be included in processing and may even be selected
earlier as a hypothesis. According to this attention-by-parts view, the
within-object search bias may be the result of normal attention
allocation policy within the visual system.

A within-object search bias is also consistent with recent
suggestions that preferences and heuristics are required to restrict the
amount of search involved in concept learning (Michalski, 1983;
Medin, Wattenmaker & Michalski, 1987). This view is not inconsistent
with  the  attention-by-parts  hypothesis,  but  emphasizes  the  functional 
role  of  such  a  bias  in  the  learning  process.  In  complex  visual 
environments,  ordered  search  for  important  features  (even  salience 
ordered  search)  is  too  resource  consuming  to  be  viable.  Rather, 
preferences  for  certain  features  or  locations  are  required  to  restrict  the 
scope  of  search.  Restricting  the  search  for  a  discriminating  feature  to 
the  area  within  the  object  boundaries  of  a  part  is  a  sensible  heuristic. 
In  our  normal  visual  perception,  objects  are  classified  or  discriminated 
by  features  within  their  own  object  boundaries.  Only  in  certain 
artificial  environments,  such  as  speech  spectrograms  or  x-ray  pictures, 
are  context-dependent  relations  set  up  by  visual  segmentation.  In  such 
environments,  what  is  normally  a  useful  heuristic  actually  hinders 
search  rather  than  aiding  it. 

In the second experiment, what was observed was not a
facilitating effect for a within-object cue, but an increased difficulty for
locating a between-object cue. Cues with low salience can be fairly
easily located when they are within the same object, but when a
low-salience cue must be found in a nearby object, learning difficulty is
increased, probably by a tendency to retry discarded within-object
hypotheses. This result has important implications for speech
spectrogram reading. First, it explains at least part of the enormous
difficulty in learning the skill of speech spectrogram reading. In
spectrogram reading, the large variability in the appearance of
phonemes means that the salience of most features is likely to be
quite low. Also, it is important to learn spectrogram patterns at the
individual phoneme level. Hence, the narrow focus induced by the task
should be expected to increase the within-object search bias and
impair discovery of context-dependent features.

Some  individuals,  too,  might  be  more  affected  by  a  search  bias 
than  others.  For  some,  it  may  only  slow  down  search,  with  the  low- 
salience  context-dependent  feature  found  only  after  within-object 
features have been searched. For others, it may mean the complete
abandonment  of  search  after  a  within-object  search  has  failed.  Such 
differences  depend  on  an  individual’s  repertoire  of  strategies  and 
learning  history.  Fortunately  for  students  of  spectrogram  reading, 
Victor  Zue  has  identified  many  of  these  features,  so  they  do  not  have 
to  be  discovered  anew. 

In most visual environments and for most perceptual skills, a
within-object bias is helpful. It restricts the amount of search required
for learning. However, for other environments and skills, such as
speech spectrogram reading, radiology, and passive sonar reading,
where visual objects and real objects do not directly correspond
(Lesgold et al., 1988; Liberman et al., 1968; Smith, 1982), it becomes a
source of learning difficulty. Overcoming such search biases may be an
important part of learning for these skills.

Figure 2. Examples of pseudo-spectrograms used in Experiment 2.